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Abstract 

Graph-constrained estimation methods encourage similarities among neighboring covari¬ 
ates presented as nodes on a graph, which can result in more accurate estimations, especially 
in high dimensional settings. Variable selection approaches can then be utilized to select a 
subset of variables that are associated with the response. However, existing procedures do 
not provide measures of uncertainty of the estimates. Moreover, the vast majority of existing 
approaches assume that available graphs accurately capture the association among covari¬ 
ates; violating this assumption could severely hurt the reliability of the resulting estimates. 
In this paper, we present an inference framework, called the Grace test, which simultaneously 
produces coefficient estimates and corresponding p-values while incorporating the external 
graph information. We show, both theoretically and via numerical studies, that the pro¬ 
posed method asymptotically controls the type-I error rate regardless of the choice of the 
graph. When the underlying graph is informative, the Grace test is asymptotically more 
powerful than similar tests that ignore external information. We further propose a more 
general Grace-ridge test that results in a higher power than the Grace test when the choice 
of the graph is not fully informative. Our numerical studies show that as long as the graph 
is reasonably informative, the proposed testing methods deliver improved statistical power 
over existing inference procedures that ignore external information. 

Keywords — Biological networks; Graph-constrained estimation; High-dimensional data; 
Significance test; Variable selection. 


1 Introduction 

Interactions among genes, proteins and metabolites shed light into underlying biological 

mechanisms, and clarify their roles in carrying out cellular functions (Zhu et ah, 2007; 

Michailidis, 2012). This has motivated the development of many statistical methods to 
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incorporate existing knowledge of biological networks into data analysis (see e.g. Kong 
et ak, 2006; Wei and Pan, 2008; Shojaie and Michailidis, 2009, 2010b). Such methods 
can lead to identihcation of novel biological mechanisms associated with the onset and 
progression of complex diseases (see e.g. Khatri et ah, 2012). 

External network information may be summarized using an undirected weighted 
graph G = (K, E, hP), whose node set V = {l,...,p} corresponds to p covariates. 
The edge set E of the graph encodes similarities among covariates, in the sense that 
two vertices u,v E V are connected with an edge e = (m ~ n) G E if covariates u 
and V are “similar” to each other. The similarity between neighboring nodes {u ~ 
v) is captured by weights w{u,v). Such similarities can for instance correspond to 
interactions between genes or phylogenetic proximities of species. 

A popular approach for incorporating network information is to encourage smooth¬ 
ness in coefficient estimates corresponding to neighboring nodes in the network using 
a network smoothing penalty (Li and Li, 2008; Slawski et ah, 2010; Pan et ah, 2010; 
Li and Li, 2010; Huang et ah, 2011; Shen et ah, 2012). This approach can also be 
generalized to induce smoothness among similar covariates dehned based on a distance 
matrix or “kernel” (Randolph et ah, 2012) which, for instance, capture similarities 
among microbial communities according to lineages of a phylogenetic tree (Fukuyama 
et ah, 2012). 

The smoothness induced by the network smoothing penalty can result in more 
accurate parameter estimations, particularly when the sample size n is small compared 
to the number of covariates p. Sparsity-inducing penalties, like the G penalty (Li and 
Li, 2008, 2010) or the minimum convex penalty (MCP) (Huang et ah, 2011), can then 
be used to select a subset of covariates X associated with the response y for improved 
interpretability and reduced variability. It has been shown that, under appropriate 
assumptions, the combination of network smoothing and sparsity-inducing penalties 
can consistently select the subset of covariates associated with the response (Huang 
et ah, 2011). However, such procedures do not account for the uncertainty of the 
estimator, and in particular, do not provide p-values. 

A number of new approaches have recently been proposed for formal hypothe- 
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sis testing in penalized regression, including resampling and subsampling approaches 
(Meinshausen and Biihlmann, 2010), ridge test with deterministic design matrices 
(Biihlmann, 2013), and the low-dimensional projection estimator (LDPE) for t'l-penalized 
regression (Zhang and Zhang, 2014; van de Geer et ah, 2014). However, there are 
currently no inference procedures available for methods that incorporate external in¬ 
formation using smoothing penalties. Inference procedures for kernel machine learning 
methods (Liu et ah, 2007), on the other hand, test the global association of covariates 
and are hence not appropriate for testing the association of individual covariates. 

Another limitation of existing approaches that incorporate external network infor¬ 
mation, including those using network smoothing penalties, is their implicit assump¬ 
tion that the network is accurate and informative. However, existing networks may 
be incomplete or inaccurate (Hart et ah, 2006). As shown in Shojaie and Michai- 
lidis (2010a), such inaccuracies can severely impact the performance of network-based 
methods. Moreover, even if the network is accurate and complete, it is often unclear 
whether network connectivities correspond to similarities among corresponding coeffi¬ 
cients, which is necessary for methods based on network smoothing penalties. 

To address the above shortcomings, we propose a testing framework, the Grace test, 
which incorporates external network information into high dimensional regression and 
corresponding inferences. The proposed framework builds upon the graph-constrained 
estimation (Grace) procedure of Li and Li (2008), Slawski et al. (2010) and Li and Li 
(2010), and utilizes recent theoretical developments for the ridge test by Biihlmann 
(2013). As part of our theoretical development, we generalize the ridge test with 
hxed design to the setting with random design matrices X. This generalization was 
suggested in the discussion of Biihlmann (2013) as a possible extension of the ridge 
test, and results in improved power compared to the original proposal. 

Our theoretical analysis shows that the proposed testing framework controls the 
type-I error rate, regardless of the informativeness and accuracy of the incorporated 
network. We also show, both theoretically and using simulation experiments, that if 
the network is accurate and informative, the Grace test offers improved power over 
existing approaches that ignore such information. Finally, We propose an extension of 
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the Grace test, called the Grace-ridge or GraceR test, for settings where the network 
may be inaccurate or uninformative. 

The rest of the paper is organized as follows. In Section 2, we introduce the Grace 
estimation procedure and the Grace test. We also formally dehne the “informativeness” 
of the network. Section 3 investigates the power of the Grace test, in comparison to its 
competitors. In Section 4, we propose the Grace-ridge (GraceR) test for robust esti¬ 
mation and inference with potentially uninformative networks. We apply our methods 
to simulated data in Section 5 and to data from The Gancer Genome Atlas (TGGA) 
in Section 6. We end with a discussion in Section 7. Proofs of theoretical results and 
additional details of simulated and real-data analyses are gathered in Section 8. 

Throughout this paper, we use normal lowercase letters to denote scalars, bold 
lowercase letters to denote vectors and bold uppercase letters to denote matrices. We 
denote columns of an n x p matrix X by XjG = and its rows by = 

1, ...,n. For any two symmetric matrices A and B, we denote A ^ B if B — A is 
positive semi-dehnite, or Ao(.B — A) > 0, where Aq denotes the smallest eigenvalue of 
a symmetric matrix. For an index set J, we denote by A(j^j) the | J| x \J\ sub-matrix 
corresponding to the rows and columns indexed by J. Finally, for a p-vector f3, we let 
ll/3|U = (ELi for keZ+ and i|/3||oo = max^/^^. 

2 The Grace Estimation Procedure and the Grace Test 


2.1 The Grace Estimation Procedure 


Let L be the matrix encoding the external information in an undirected weighted 
graph G = {y,E,W). In general, L can be any positive semi-dehnite matrix, or 
kernel, capturing the “similarity” between covariates. In this paper, however, we focus 
on the case where L is the graph Laplacian matrix. 






dn 


-w{u, v) 


0 


if u = V 

if u and v are connected , 
otherwise 
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with du = 'i') denoting the degree of node u. We also assume that weights 

w{u,v) are nonnegative. However, the dehnition of Laplacian and the analysis in this 
paper can be generalized to also accommodate negative weights (Chung, 1997). 

Let X = (a^i,..., Xp) G be the nxp design matrix and y G M"" be the response 
vector in the linear model 

y = X(3* + e, e ~ iV^O, iVj,(0, S) for t = 1,..., n. (1) 

Multivariate normality of covariates is commonly assumed in analysis of biological 
networks, particularly, when estimating interactions among genes or proteins using 
Gaussian graphical models (see e.g. de la Fuente et ah, 2004). Interestingly, the un¬ 
derlying assumption of network smoothing penalties - that connected covariates after 
scaling have similar associations with the response - is also related to the assumption 
of multivariate normality (Shojaie and Michailidis, 2010b). Without loss of generality, 
we assume y is centered and columns of X are centered and scaled, i.e. I/* = 0 

and Er=i jj) = 0, x~^Xj = n for j = 1, ...,p. We denote the scaled Gram matrix by 
S = X^X/n. 

For a non-negative tuning parameter h, Grace solves the following optimization 
problem: 


/3(h) = argmin 111 ^ — X/ 3 II 2 -I-h/3~''X/3| = (nS-I-hX) ^X^y. (2) 

When L is the Laplacian matrix, f3~^Lf3 = v) (Huang et ah, 2011). 

Hence, the Grace penalty (3^LjS encourages smoothness in coefficients of connected 
covariates, according to weights of edges. Henceforth, we call L the penalty weight 
matrix. 

For any tuning parameter h > 0, Equation (2) will have a unique solution if 
(nS -|- hL) is invertible. However, if p > n and rank{L) < p this condition may 
not hold. With a Gaussian design a;* iVp(0, S), it follows from Bai (1999) that 
if liminf^^oo Ao(S) > 0, and if there exists a sequence of index sets Cn C {l,...,p}. 
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\im.n^^\Cn\/n < 1, such that liminf^^oo Ao(lv(y\c'„,v'\c„)) > 0, then (nS + hL) is 
almost surely invertible. In this section we hence assume that (nS + hL) is invertible. 
This condition is relaxed in Section 4, when we propose the more general Grace-ridge 
(GraceR) test. 

As mentioned in the Introduction, several methods have been proposed to select 
the subset of relevant covariates for Grace. For example, Li and Li (2008, 2010) added 
an penalty to the Grace objective function, 

= argmin jll?/- X/ 3 II 2 -h/i/3^R/3-h/ii||/3||^| . (3) 

Huang et ah (2011) instead added the MGP and proposed the sparse Laplacian shrink¬ 
age (SLS) estimator. While these methods perform automatic variable selection, they 
do not provide measures of uncertainty, i.e. conhdence intervals or p-values. In this 
paper, we instead propose an inference procednre that provides p-valnes for estimated 
coefficients from Eqnation (2). The resulting p-valnes can then be used to assess the 
signihcance of individual covariates, and select a snbset of relevant variables. 


2.2 The Grace Test 


Before introducing the Grace test, we present a lemma that characterizes the bias of 
the Grace estimation procedure. 

Lemma 2.1. For any h > 0, assume (nS -|- hL) is invertible. Then, given X, f3{h) 
as formulated in (2) is an unbiased estimator of f3* if and only if Lf3* = 0. Moreover, 


Bias(/3(h)|X)|l2 < 


h\\L(3*h 

Xo{n'S + hL) 


( 4 ) 


Because the bias of the Grace estimator depends directly on the magnitnde of LjS*, 
we consider L to be informative if Lf3* is small. According to Lemma 2.1, the Grace 
estimator will be unbiased only if (3* lies in the space spanned by the eigenvectors of 
L with 0 eigenvalues. In reality, however, this condition cannot be checked from data. 
Thns, to control the type-I error rate, we must adjust for this potential estimation 
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bias. 

Our testing procedure is motivated by the ridge test proposed in Biihlmann (2013), 
which we briefly discuss next. First, note that ridge is also a biased estimator of /3*, 
and its estimation bias is negligible only if the ridge tuning parameter is close to zero. 
In addition to the estimation bias, Biihlmann (2013) also acconnted for the projection 
bias of ridge regression for a fixed design matrix X. This is because for fixed design 
matrices with p > n, f3* is not nniquely identifiable, as there are infinitely many /3’s 
snch that E(^) = Xf3. Using ridge regression, f3* is only estimable if it lies in the row 
space of X, 71{X), which is a proper subspace of when p > n. If f3* does not lie in 
this subspace, the ridge estimated regression coefficient is indeed the projection of [3* 
onto TZ{X), which is not identical to f3*. This gives rise to the projection bias. 

To acconnt for these two types of biases, Biihlmann (2013) proposed to shrink 
the ridge estimation bias to zero by shrinking the ridge tuning parameter to zero, 
while controlling the projection bias using a stochastic bias bonnd derived from a lasso 
initial estimator. A side effect of shrinking the ridge tuning parameter to zero is that 
the variance of covariates with high multi-collinearity conld become large; this would 
hurt the statistical power of the ridge test. In addition, the stochastic bonnd for the 
projection bias is rather loose. This double-correction of bias farther compromises the 
power of the ridge test. 

In this paper, we develop a test for random design matrices, which was snggested 
in the discussion of Biihlmann (2013) as a potential extension. With random design 
matrices, we do not incnr any projection bias. This is because the regression coefficients 
in this case are nniquely identifiable as S“^Cov(X,y) nnder the joint distribntion of 
{X,y). Here, S denotes the population covariance matrix of covariates and Cov(X, y) 
is the population covariance between the covariates and the response; see Shao and 
Deng (2012) for a more elaborate discussion of identifiability for fixed and random 
design matrices. 

To control the type-I error rate of the Grace test, we adjust for the potential esti¬ 
mation bias using a stochastic bound derived from an initial estimator. By adjusting 
for the estimation bias using a stochastic upper bound, the Grace tuning parameter 
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needs not be very small. Thus, the variances of Grace estimates are less likely to be 
unreasonably large; this results in improved power for the Grace test. Power proper¬ 
ties of the Grace test are more formally investigated in Section 3. Next, we formally 
introduce our testing procedure. 

Gonsider the null hypothesis Hq ■. (3* = Q for some j G {1, ...,p}. Let /3 be an initial 
estimator with asymptotic G estimation accuracy, i.e. \\f3 — (3*\\i = Op{l). The Grace 
test statistic is defined as 


= /3(/i) + h{n± + (5) 

where /3{h) is the Grace estimator from (2) with tuning parameter h. Plugging in (2) 
and adding and subtracting /i(nS -|- hL)~^L(3, we can write 

= l^*j + ’ 3 = 1 , ( 6 ) 


where 


Zf\X ~ N 



{nt + hL)-^t{nt + hL) 





7 ^ = h{nt + hL)-^L{i3 - (3*). 


Next, we derive an asymptotic stochastic bound for 7 ^ such that under the null hy¬ 
pothesis 

I G| ^asy. pG equivaleutly, lim Pr (lyf| < Vf) = 1. (7) 

Then, under the null hypothesis, \zf\ which allows us to asymptotically 

control the type-I error rate. 

To complete our testing framework, we use the fact under suitable conditions and 
with proper tuning parameter hiasso, described in Theorem 2.3, the ii estimation error 
of the lasso, 

P{hLasso) = argmin I -II?/ - X(3\^ + hLasso\\f3\\. \ , 


( 8 ) 



is asymptotically controlled (Biihlmann and van de Geer, 2011). We thus use the lasso 
estimator as the initial estimator for the Grace test, i.e. /3 = l3{hLasso)- Theorem 2.3 
then constructs a T^ that satishes Gondition (7). First, we present required conditions. 

• AO; (nS + hL) is invertible. 

• Al; y = X(5* + e where a;* Ap(0, S) for i = 1,..., n and e ~ iV„(0, (Jg/). 

• A2; Let Sq = {j ■. (3* ^ Q} be the active set of f3* with cardinality Sq = |5'o|. We 
have So = o {^n/ logpj^j for some 0 < .^ < 1/2. 

• A3; The S-compatibility condition (Biihlmann and van de Geer, 2011) in Dehni- 
tion 2.2 is met for the set Sq with compatibility constant liminf^^oo 0|; „ = d > 0, 
where d is a constant. 

• A4; h and L are such that 



+ hL)-^hL 


(ij) 


Op 



Corollary 2.2 (S-Gompatibility Gondition). For an index set S C {l,...,p} with 
cardinality s, define f3^ and such that fij = We say that 

the Ti-compatibility condition is met for the set S with compatibility constant 0s > 0 
if for all f3 eW living in the cone ||/3‘®"'||i < 3||/3'^||i, we have 


Ib'lli < P) 

As discussed in Section 2.1, AO is required for uniqueness of the Grace estimator, and is 
justified by the Gaussian deign. A2 is a standard assumption, and requires the number 
of relevant covariates to not grow too fast, so that the signal is not substantially diluted 
among those relevant covariates. Note that with p = O (exp(n'^)) for some z/ < 1, sq 
can grow to inhnity as n —)■ cx). The S-compatibility condition in A3 is closely related 
to the restricted eigenvalue assumption introduced in Bickel et al. (2009). Assumption 
A4 is made for improved control of type-I error, and can be relaxed at a cost of 
potential loss of power with hnite samples; see Remark 2.2. On the other hand, given 
X and L, when h/n —)■ cx), the eigenvectors and eigenvalues of {n/h)Yl + L converge to 
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the eigenvectors and eigenvalues of L. This indicates that (nS + hL) ^hL converges 
to a diagonal matrix with diagonal entries equal to 0 or 1, and A4 is satished. 

Theorem 2.3. Suppose Assumptions AO - A4 are satisfied, and let /3 = ${hLasso) 
with the tuning parameter hLasso A/logp/n. Let 


q ^ h 





( 10 ) 


(h-j) 

absolute value of entries in row j without the diagonal entry. Then T^ satisfies condi¬ 
tion (7). 

Under the null hypothesis Hq : fij = 0, for any a > we have 


where 


\n± + hL)-^L 


= maxLi^j (nS + hL) is the maximum in 

OO 


lim sup Pr ( I > “) < lim sup Pr [\zf\ + Tf > a) . 


( 11 ) 


Remark If we instead consider 


Tf = h 


\n± + hL)-^L]^. 


/ logp \ " 
OO \ n J 




we can relax Assumption A4 and still control the asymptotic type-I error rate. The¬ 
orem 2.3 can then be similarly proved without A4. However, as h/n —)■ oo, (nS -|- 
hL)~^hL converges to a diagonal matrix, in which case [in's -\- hL)~^hL~^^^ ^ 

in hnite samples. 


> 


. This looser stochastic bound may result in lower power 


Theorem 2.3 shows that regardless of the choice of L, the type-I error rate of the 
Grace test is asymptotically controlled. The stochastic bound PP relies on the unknown 
sparsity parameter Following Biihlmann (2013) we suggest a small value of and 
use ^ = 0.05 in the simulation experiments in Section 5 and real data example in 
Section 6. 
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Using (11), we can test Hq using the asymptotically valid two-sided p-value 


Pf = 2 


$ 



( 12 ) 


where <h is the standard normal c.d.f., and a_|_ = max(a, 0). Calculating p-values 
requires estimating and choosing a suitable tuning parameter h. We can estimate 
a‘1 using any consistent estimator, such as the scaled lasso (Sun and Zhang, 2012). In 
the simulation experiments and real data example, we choose h using 10-fold cross- 
validation (CV). 

Note that, when simultaneously testing multiple hypotheses: Hq : /3* = 0 for any 
j G J C {1, versus Ha '■ P* ^ 0 for some j G J, we may wish to control the 
false discovery rate (FDR). Because covariates in the data could be correlated, test 
statistics on multiple covariates may show arbitrary dependency structure. We thus 
suggest controlling the FDR using the procedure of Benjamini and Yekutieli (2001). 
Alternatively, we can control the family-wise error rate (FWER) using, e.g. the method 
of Holm (1979). 


3 Power of the Grace Test 

In this section, we investigate power properties of the Grace test. Our first result 
describes sufficient conditions for detection of nonzero coefficients. 

Theorem 3.1. Assume Assumptions AO - A4 are met. If for some h, some 0 < a < 
1, 0 < '^ < 1, conditional on X, we have 


\l3*\ > 2F^ -I- qi^i_a/2)\jHax{Z^\X) + (13) 

where <h (g(i_a/ 2 )) = 1 — Oi/2. Then using the same tuning parameter h in the Grace 
test, we get lim„^oo P'f' {Pf < aj A) Pip. 

Having established the sufficient conditions for detection of non-null hypotheses 
in Theorem 3.1, we next turn to comparing the power of the Grace test with its 
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competitors: the Grace test, the ridge test with small tuning parameters /i 2 = 0{1) 
and no bias correction, and the Gracel test, which is the Grace test with identity 
penalty weight matrix I. The ridge test may be considered as a variant of the test 
proposed in Biihlmann (2013) without the adjustment of the projection bias - because 
we assume the design matrix is random, we incur no projection bias in the estimation 
procedure. 

As indicated in Lemma 2.1, the estimation bias of the Grace procedure depends on 
the informativeness of the penalty weight matrix L. When L is informative, we are 
able to increase the size of the tuning parameter, which shrinks the estimation variance 
without inducing a large estimation bias. Thus, with an informative L, we are able 
to obtain a better prediction performance, as shown empirically in Li and Li (2008); 
Slawski et ah (2010); Li and Li (2010). In such setting, the larger value of the tuning 
parameter, e.g. as chosen by GV, also results in improved testing power, as discussed 
next. 

Theorem 3.2 compares the power of the Grace test to its competitors in a simple 
setting of p = 2 predictors, Xi and X 2 . In particular, this result identihes sufficient 
conditions under which the Grace test has asymptotically superior power. It also gives 
conditions for the Gracel test to have higher power than the ridge test. The setting 
of p = 2 predictors is considered mainly for ease of calculations, as in this case, we 
can directly derive closed form expressions of the corresponding test statistics. Similar 
results are expected to hold for p > 2 predictors, but require additional derivations 
and notations. 

Assume y = Xi/3l +X2(32 where e ~ iV2(0, Ug/), and Xi, X 2 are scaled. Denote 



Theorem 3.2 considers the power for testing the null hypothesis Hq : (3\ = 0, in settings 
where (31 ^ 0, without any constraints on 

Theorem 3.2. Suppose Assumptions AO - A4 are met. Let and 
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be the Grace, Gracel and ridge p-values, respectively, with tuning parameters h^ 
for Grace and h^^ for Gracel. Define 

T 

(14) 

Then, conditional on the design matrix X, under the alternative hypothesis (31 = h 0, 
the following statements hold with probability tending to 1, as n ^ oo. 

a) If lim Tp^n{hn,l,P,\b\) > lim 0,p, |6|), then lim [Pf < 

71^00 n^oo n^oo 

1 . 

b) If lim |6|) > ^1- p^\b\, then lim [Pf (/i);)/Pf] < 1. 

n^oo n^oo 

c) If lim Tp^n{h^^,0,P, \b\) > a/1 - p^ \b\, then lim [Pi\h^^)/Pf^] < 1. 

n^oo n^oo 

Theorem 3.2 indicates that, as h^/n and h^^/n diverge to inhnity, both Tp^n{h^, I, p, l/dil) 
and Tp^ri{h^^,0, p, |/3J‘|) approach inhnity. This implies, on one hand, that for h^ and 
h^^ sufficiently large, both the Grace and Gracel tests are asymptotically more power¬ 
ful than the ridge test. On the other hand, we can only compare the powers of the Grace 
and Gracel tests under some constraints on their tuning parameters. With equal tuning 
parameters for Grace and Gracel, h^ = hfl^, we can show, after some algebra, that as 
h^/n = h^^/n -)■ cx), we have lim„^oo /, p, 1/3^1) > lim„^oo 0, p, 1/3^1) 

if (1 - n > a/(1 + /^ — 2/p). In this case, the Grace test is more powerful than the 
Gracel test if I is between 0 and I*, where I* is the unique root in [—1,1] of the cubic 
equation — 31 + 2p = 0. Figure 1(a) compares the powers of the Grace and Gracel 
tests with equal tuning parameters h^/n = h^^/n = 10 and /3l = 1. It can be seen 
that, the Grace test asymptotically outperforms the Gracel test when I is close to 
p with equally large tuning parameters. However, when / is far from p, the Gracel 
test could be more powerful. This observation, and the empirical results in Section 5 
motivate the development of the GraceR test, introduced in Section 4. 

A similar comparison for powers of the Grace and the ridge test, with h^/n = 10 and 
/dj' = 1, is provided in Figure 1(b). These results suggest that, with large Grace tuning 


^ [{h/n + lf - jp + lh/nf] • |A| - [log^ • \ {l - p)h/n 
v^(l + 2Vn)(l - p2) + {h/ny{l + P- 2/p) 
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parameters, Grace substantially outperforms the ridge test in almost all scenarios. The 
result for the Grace and ridge comparison is similar with h^/n = 1. 

4 The Grace-Ridge (GraceR) Test 

As discussed in Section 2, an informative L results in reduced bias of the Grace proce¬ 
dure, by choosing a larger tuning parameter h. The result in Theorem 3.2 goes beyond 
just the bias of the Grace procedure. It shows that for certain choices of L, i.e. when 
I is close to the true correlation parameter p, the Grace test can have asymptotically 
superior power. This additional insight is obtained by accounting for, not just the bias 
of the Grace procedure, but also its variance, when investigating the power. 

However, in practice, there is no guarantee that existing network information truly 
corresponds to similarities among coefficients, or is complete and accurate. To address 
this issue, we introduce the Grace-ridge (GraceR) test. The estimator used in GraceR 
incorporates two Grace-type penalties induced by L and I: 

0{hG, ^ 2 ) = argmin |||y - + hcjS^LfS + h2(3~^(3^ = {n'S + hGL + h 2 l) ^X^y. 

^ (15) 

Using data-adaptive choices of tuning parameters Hg and ^ 2 , we expect this test to be 
as powerful as the Grace test if L is informative, and as powerful as the Gracel test, 
otherwise. 

Another advantage of the GraceR over the Grace test is improved bias-variance 
tradeoff. If L is (almost) singular, the variance of the Grace test statistic, which 
depends on the eigenvalues of (nS -|- hL), could be large even for reasonably large h. 
Thus, even though our discussion in Section 2.1 shows that {n'S + hL) is almost surely 
invertible, with hnite samples, its smallest eigenvalue could be very small, if not zero. 
If L is informative, L(3 and hence the bias in (4) are small. Thus, the rank-dehciency 
of (nS -|- hL) can be alleviated by choosing a large value of h. However, if T/3 is non- 
negligible, choosing a large value of h may result in a large bias, even larger than the 
ridge estimate, to the extent which may offset the beneht from the variance reduction. 
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The finite sample type-I error rate of the Grace test may thus be controlled poorly. 
By incorporating an additional G penalty, we can better control the eigenvalues and 
achieve a better bias-variance trade-off. 

The GraceR optimization problem leads to the following test statistic: 

= /3(hG, ha) + (nS + hcL + h 2 l)-\hGL + ha/)/?- (16) 


Similar to Section 2.2, we can write 


zGR 




+ 


3 = 1,--,P, 


(17) 


where 


Zf^\X ~ N ( 0,na, 


^ (nS + hcL + h 2 l)-\hGL + ha/)(/3 - /?)• 


I (nS + HgL + ha/)~^S(nS + + ha/)"' 

- 1 / 




Similar to the Grace test in in Section 2.2, we choose /3 to be an initial lasso estima¬ 
tor, and derive an asymptotic stochastic bound for such that |y^^| 

Equation (12) is again used to obtain two-sided p-values for Hq. Theorems 4.1 and 
4.2 parallel the previous results for the Grace test, and establish GraceR’s asymptotic 
control of type-I error rate, and conditions for detection of non-null hypotheses. Proofs 
of these results are similar to Theorems 2.3 and 3.1, and are hence omitted. We first 
state an alternative to Assumption A4. This assumption can be justified using an 
argument similar to that for Assumption A4, and can also be relaxed with the cost of 
reduced power for the GraceR test. 

• A4’: he, ^2 and L are such that 


(nS + HgL + h 2 iy\hGL + h 2 l) 


ihj) 



Theorem 4.1. Assume Assumptions A1 - A3 and A4’ are met. The following 
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satisfies the stochastie bound for GraeeR. 

T^^^\[{ni: + hGL + h2l)-\hGL + h2l)\^^_.^\^ (^)' 

Then, under the null hypothesis, for any a > 0, 

lim sup Pr (| zf^ | > «) < lim sup Pr (| Zf^ \ + > a) . (19) 

n^oo n^oo 

Theorem 4.2. Assume Assumptions A1 - A3 and A4’ are met. If for some /ig > 0 
and /i 2 > 0, conditional on X, we have 



for some 0 < a < 1 and 0 < fj < 1. Then using the same hG and /i 2 in the GraeeR 
test, we get lim„_^oo Pr {Pf^ < ct [a) > fj. 

5 Simulation Experiments 

In this section, we compare the Grace and GraeeR tests with the ridge test (Biihlmann, 
2013) with small tuning parameters, low-dimensional projection estimator (LDPE) for 
inference (Zhang and Zhang, 2014; van de Geer et ah, 2014) and the Gracel test. To 
this end, we consider a graph similar to Li and Li (2008), with 50 hub covariates (genes), 
each connected to 9 other satellite covariates (genes). The 9 satellite covariates are 
not connected with each other, nor are covariates in different hub-satellite clusters. In 
total the graph includes p = 500 covariates and 450 edges; see Figure SI in Section 8 
for an illustration with 5 hub-satellite clusters. We build the underlying true Laplacian 
matrix L* according to the graph with all edge weights equal 1. 

To assess the effect of inaccurate or incomplete network information, we also con¬ 
sider variants of the Grace and GraeeR tests with incorrectly specihed graphs, where 
a number of randomly selected edges are added or removed. The number of removed 
or added (perturbed) edges relative to the true graph is NPE G {-165, -70, -10, 0, 
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15, 135, 350}, with negative and positive numbers indicating removals and additions 
of edges, respectively. For example, NPE=-165 indicates 165 of the 450 edges in the 
true graph represented by L* are randomly removed in the perturbed graph with cor¬ 
responding perturbed Laplacian matrix L. This represents the case with incomplete 
network information. On the other hands, NPE = 350 indicates that in addition to the 
450 true edges in L*, we also randomly add 350 wrong edges to L. The NPE values con¬ 
sidered correspond to similar normalized spectral differences for settings where edges 
are removed or added, i.e. \\L - T*|| 2 /||T *||2 ~ (0.75,0.50,0.25,0,0.25,0.50,0.75). 
Thus, the size of perturbation to the graph is roughly the same with NPE = -165 and 
350. The perturbed penalty weight matrix L is then used in the Grace and GraceR 
tests. Since {X"' X + hL) may not be invertible, for Grace, we add a value of 0.01 to 
the diagonal entries of L to make it positive dehnite. No such correction is needed for 
GraceR and Gracel because of the £2 penalty. 

In each simulation replicate, we generate n = 100 independent samples, where for 
the 50 hub covariates in each sample, iV(0,1), k = 1,...,50, and for the 9 

satellite covariates in the k-th hub-satellite cluster, iV(0.9 x a;^“^,0.9), I = 

1,..., 9, /c = 1,..., 50. This is equivalent to simulating Np{0, S) for i = 1,..., 100 

with S = {L* + 0.11 X I)~^, where L* corresponds to the partial covariance structure 
of the covariates. 

We consider a sparse model in which covariates in the hrst hub-satellite cluster 
are equally associated with the outcome, and those in the other 49 clusters are not. 
Specihcally, we let 

/3* = 

10 P-10 

We then simulate y = X/3* -|-e, with e ~ Wi(0, a^/n), and consider ae G {9.5, 6.3,4.8} 
to produce expected = 1 — a1/Yax{y) G (0.1, 0.2, 0.3}. 

Throughout the simulation iterations, L* and (3* are kept hxed, and L, X and e 
are randomly generated in each repetition. We set the sparsity parameter ^ = 0.05, 
and hiasso = 4(TeA/3 logp/n, where de is calculated using the scaled lasso (Sun and 
Zhang, 2012). As suggested in Biihlmann (2013), the tuning parameter for the ridge 
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test is set to 1. Tuning parameters for LDPE, Grace, GraceR and Gracel are chosen 
by 10-fold GV. We use two-sided significance level a = 0.05 and calculate the average 
and standard error of powers from 10 non-zero coefficients and the type-I error rates of 
each test from 490 zero coefficients. Figure 2 summarizes the mean powers and type-I 
error rates of tests across B = 100 simulated data sets, along with the corresponding 
95% conhdence intervals. Detail values of powers and type-I error rates, as well as an 
expanded simulation with a larger range of NPE, are available in Section 8. 

Gomparing the power of the tests, it can be seen that the Grace test with correct 
choices of L (NPE = 0) resnlts in highest power. The performance of the Grace test, 
however, deteriorates as L becomes less accnrate. The performance of the GraceR test 
is, on the other hand, more stable. It is close to the Grace test when the observed L 
is close to the trnth, and is roughly as good as the Gracel test when L is signihcantly 
inaccnrate. As expected, our testing procednres asymptotically control the type-I error 
rate, in that observed type-I error rates are not signihcantly different from a = 0.05. 

6 Analysis of TCGA Prostate Cancer Data 

We examine the Grace and GraceR tests on a prostate adenocarcinoma dataset from 
The Gancer Genome Atlas (TGGA) collected from prostate tumor biopsies. After 
removing samples with missing measurements, we obtain a dataset with n = 321 
samples. For each sample, the prostate-specihc antigen (PSA) level and the RNA 
sequences of 4739 genes are available. Genetic network information for these genes is 
obtained from the Kyoto Encyclopedia of Genes and Genomes (KEGG), resulting in 
a dataset with p = 3450 genes and \E\ = 38541 edges. 

We center the ontcome and center and scale the covariates. For the Grace and 
GraceR tests, we set the sparsity parameter ^ = 0.05 and hiasso = 4(Je a/ 3 logp/n, 
where dg is calculated using the scaled lasso (Sun and Zhang, 2012). We control the 
false discovery rate at a = 0.05 level using the method of Benjamini and Yekntieli 
( 2001 ). 

To increase the chance of selecting “hnb” genes, we use the normalized Laplacian 
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matrix where D is the diagonal degree matrix for the KEGG 

network with edge weights set to 1. The Grace penalty induced by the normalized 
Laplacian matrix encourages smoothness of coefficient estimates based on the degrees 
of respective nodes, \/^Yw{u, v) (Li and Li, 2008). 

We add 0.001 to the diagonal entries of to induce positive dehnitiveness in the 

Grace test. 

As shown in Figure 3(a), the Grace test with tuning parameter selected by 10- 
fold GV identihes 54 genes that are associated with PSA level. They consist of 42 
histone genes, 11 histone deacetylase (HDAG) genes and the paired box gene 8 (PAX8). 
Histone and HDAG genes are densely connected in the KEGG network. With the 
network smoothing penalty, the Grace regression coefficients of histone and HDAG 
genes are all positive with a similar magnitude. Existing literature indicates that 
the histone and HDAG genes are associated with the occurrence, progression, clinical 
outcomes or recurrence of prostate cancer. Figure 3(b) shows the result for the GraceR 
test. GraceR identihes 5 histone genes, which are also identihed by the Grace test. In 
addition, GraceR identihes 11 genes that are not identihed by Grace. Prior work has 
identihed 9 of those 11 genes to be associated with PSA level or the severity and stage 
of cancer. Additional details about existing evidence in support of genes identihed 
using Grace and GraceR tests, as well as extended results on prediction performance 
and stability of the Grace test are provided in Section 8. 

As a comparison, the Gracel test with 10-fold GV identihes 16 disconnected genes, 
11 of them are also identihed by the GraceR test. Ridge test (Buhlmann, 2013) with 
tuning parameter ^2 = 1 identihes 4 disconnected genes, which are also identihed 
by the GraceR test. The low-dimensional projection estimator (LDPE) with tuning 
parameters chosen by 10-fold GV identihes 10 disconnected genes. Seven of these genes 
are identihed by GraceR and two by Grace. 
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7 Discussion 


In this paper, we proposed the Grace and GraceR tests that incorporate external 
graphical information regarding the similarity between covariates. Such external in¬ 
formation is presented in the form of a penalty weight matrix L, which is considered 
to be the (normalized) graph Laplacian matrix in this paper. However, any positive 
semi-dehnite matrix can be used as L. The proposed inference framework thus al¬ 
lows researchers in different helds to incorporate relevant external information through 
L. For example, we can use various distance and kernel metrics that measure the 
(dis)similarity between species in phylogenetic studies. We can also use the adaptive 
graph Laplacian matrix (Li and Li, 2010) so that coefficients of negatively correlated 
covariates are penalized to have the opposite signs. Regardless of the choice of R, 
our proposed procedures asymptotically control the type-I error rate; the power of the 
Grace test, however, depends on the informativeness of L. The power of the GraceR 
test is on the other hand less dependent on the choice of L. 

The Grace test introduced in this paper is not scale invariant. That is, the Grace 
test with the same tuning parameter could produce different p-values with data (X, y) 
and (X, ky), where /c 7 ^ 1 is a constant. This is clear as the test statistic Zj depends 
on y whereas the stochastic bound T^ does not. To make the Grace and GraceR 
tests scale invariant, we can simply choose the tuning parameter for our lasso initial 
estimator to be hiasso = Ca^^/\ogp/n with a constant C > 2\/2. Sun and Zhang 
(2012) show that the lasso is scale invariant in this case. We would also need to use 
scaled invariant stochastic bounds T^ — o'eTf and Tf'^ = (JeT^^ in our Grace and 
GraceR tests. Note that multiplying any constant in T^ and T^'^does not change our 
asymptotic control of the type-I error rate. 

In this paper, cross validation (GV) is used to choose tuning parameters of the 
Grace and GraceR tests. However, GV does not directly maximize the power of these 
tests. Selection of tuning parameters for optimal testing performance can be a fruitful 
direction of future research. Another useful extension of the proposed framework is its 
adaptation to generalized linear models (GLM). 
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8 Supplementary Materials 

8.1 Proof of Lemma 2.1 

Proof. Given that (nS + hL) is invertible and h > 0, we have 

Bias(/3(h)|X) = E{0{h)\X) - f3* 

= (nS + hL)-^n±l3* - (nS + hLy\n± + hL)f3* 

= -{n± + hL)-^hLl3*, 

which is equal to 0 if and only if Lf3* = 0. We know that 

(nS + hL)-^ ^ - 1. 

Ao(nS + hL) 

Therefore, 

||Bias(/3(h))|x||2 = h\J {L(3*)^ (nS + hL)-^{L(3*) 

< h 

h\\Ln 
Xo{n'S + hL) 

□ 
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8.2 Proof of Theorem 2.3 


Proof. Under the null hypothesis Hq : /3* = 0, we have 


O-fl =ft|(nE + fei)-‘i(/3-/3-)|j 

i=l 

< h\ [(nS + - A-)| + ft I [(nE + 

< ft||[(nE + ftX)-'i;],^ _.,|P||/3 + A| [(nE + fti)-'i],,,,ft 


Based on Biihlmann and van de Geer (2011), Chapter 6.12, with Gaussian design, if 
the S-compatibility condition is met for the set So with compatibility constant with 
probability tending to 1, the condition is also met for S with compatibility constant 
0s > 0s/2. Moroever, with hLasso ^ A/logp/n and the S-compatibility condition for 
the set S'o, with probability tending to 1, we have 




Then, because Sq = c>([n/logp]^) and liminf > d/2 > 0, we get 

On the other hand, by Assumption A4, ((nS + hL)~^hL'^ = (2p((n/logp)^/^“^). 

Thus 


ft | [(nS + ftX)-‘X]yj,ft| = I [(nS + ftX)-‘ftX]yj,||ft - P’] = o,(l), 
and hence 

Pr (hfl < ft||[(nS + ftX)-'X]„,_„|L(!^)^-«) ^ 1, 
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where the right hand side is F^. We can thus write 


< hf I + bf I 

^asy. I ^G\ -pG 


□ 


8.3 Proof of Theorem 3.1 

Proof. Given (12), conditional on X, the objective of < a is satished if lv°l > 
+ g(i_Q,/ 2 )'y/Var(Zj^|X). According to Equation (6), this is equivalent of \ j3* + + 

'lf\ > + <?(i-«/ 2 ) Y^Var(Zj^|X), which is satisfied if 

|/3-| - |7f I - |zf [ > r° + «(i-„/2idV”(2°|V). 

This holds with probability at least ip if 

l^i I ~ + Q'(i-a/2)^Var(Zj^|X) + q(i_.^i2). 

We know that with probability tending to 1, \lf\ < F^. Therefore, conditional on 
we have P^ < ctl with probability tending to at least tp, if 

\P*\ > 2F^ + g(i-a/2)'y/Var(Zj^|X) + 


□ 


8.4 Proof of Theorem 3.2 

Proof, a) We note that Pf^ / Pf'^ < 1 is equivalent of 


(1 

.^GI 

-rf')+/Aai'(zF'|x) 

( 


-rf)pyvar(zfix) 
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We first write out those components for the Grace test: 




r 


G 

1 


Var(Zf|X) 


{{X^X + h^L)-\X^y + h^LP))^ 

(n + h^)xjy - (np + h^l)x^y + h^Pi{n + - npl - h^f) + nh^j32{l - p) 


(n + h^f - {np + h^iy 




/ logP 

V n 




h^[{X^X + h<iL)-^L\ 


( 1 , 2 ) 


/ logp \ " 


h-^ 


\nh^l - nh^p\ [ logp\ ^ 




(n + h^Y - {np + h^iy \ n J 


al [{X^X + h^^Ly^X^X{X^X + 

2 {rY + 2hgn^)(l - p^) + n(hg)^(l + /^ - 2/p) 

[(n + h«)2-(np + hG/)2]2 


We can also write out those components for the Gracel test likewise with I = 0. 

In the proof of Theorem 2.3, we have shown that Pr ^||/3 — /3*||^ < 4/iLassoSo/0|,j —t 
1. With hiasso = 0{\ogp/n), sq = G([n/logp]^) for some 0 < .^ < 1/2, liminf^^ > 
d/2 > 0, and p = C)(exp(n'^)) for some 0 < z/ < 1, we have \\/3 — f3\\i = Gp(l). Thus 
we get 

idl= Yl + Gp(l), jd2= 1^2 + Gp(l). 

We also note that since our design matrix is scaled, we get 


x~ly = x~l XiYl + *7 *2/92 + xle = n/3l + np/d^ + nE, 
xjy = xjxi/dl + xjx2f32 + xje = npfdl + n/d^ + nE^ 


where E ^ N {0, cj\/n) = Gp(l). 

Dehne k// — h/l/n and /n. With some algebra. We get 

(Izfl - + 1)^ - (P + lk°f + OrW \■ 1/3;I - (logp/n)’/^-^ ■ |<:g(/-p)|]^ 

VVar(ZflJt) “ ff,V(1 + 2*:f)(l - P^) + (*:.?)"(1 + P- 2lp) 

(21) 
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Similarly for the Gracel, we get 


(I^PI -rn+ _ v^0(^n^ + l)^-P^ + ^p(l)l • 1/^11 - {\ogp/nf/‘^ IfcgVl]^ 

v/Var(Zf^|X) “ a,^(1 + 2A;G^)(1 - p^) + (A;G/)2 ' 

( 22 ) 


We observe that + 1 > 1 > |p| and + 1 > \l\k^ + \p\ > \p + lk^\. We plug in 
those two inequalities into Equation (21) and (22). Hence, conditional on the design 
matrix X, < 1 with probability tending to 1 if 


lim 

n^oo 


> lim 

n^oo 


{[(*:° + 1)^ - (P + ik°f\ ■ Ift'l - (logp/ii)*-'^ ^ ■ \>=SV - P)\}. 

y(i+ 2kO)(i - p^) + (ksni + p^W) 

{ [(*;?' + 1)“ - pT ■ Ifti - (logp/n)‘/2-« • |P“piy 

y(i + 2P«)(i-p2) + (PG/)2 ■ 


Note that for any two real numbers / and g, f > g implies /+ > p+. Thus, 
conditional on the design matrix X, P^/P^^ < 1 with probability tending to 1 if 


lim 

n^oo 


> lim 

n^oo 


PS + 1)^ - (fi + II^Sf] ' Iff I - (logp/n)’-''^ ^ ■ |fcg(i -p)| 
v/(l + 2kS)(l - p^) + (kOfil + P- 2lp) 

PS' + 1)^ - P^] ■ lft*l - {iogp/ny/^-i ■ IPgVI 

P(i+2kS‘){i-fP + {kS'r 


(23) 


If we assume k^ = k^^ = /c —)■ cx). Inequality (23) is satished if 


lim 

n^oo 


{k + 1)^ - (p + Ik)'^] • |/9i| - (logp/n)^/^ ^ • \k{l - p)| 


X 


{k + 1)2 - p2] ■ |/3i I - (logp/n)H2-^ . \kp\ 

v/(l + 2fc)(l-p2)+F 

v/(l + 2A;)(1 - p2) + p2(i + /2 _ 2/p) 

[(1 - /2) + (2 - 2lp)/k + (1 - P^)A^] ■ l/^il - (logp/n)i/2-€ . !(/ _ 


= lim 


X 


1 + 2/k + (1 - p‘^)/k‘^] ■ 1/3^1 - (logp/n)H2-« . \p/k\ 

V^l + (2 - 2p^)/k + (1 - p2)/A;2 


v/(l + /2 - 2/p) + (2 - 2p2)/A; + (1 - p2)/p2 
(1-/2) 


a/( 1 + /2 — 2/p) 


> 1 . 


(24) 
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The last equality holds because p = (9(exp(n'^)) for some 0 < z/ < 1 implies that 
logp/n —)■ 0. 

For the ridge test, we assume = 0{1). Thus with some algebra we can similarly 
write out the ridge test objective: 


l^fl ^ Vn\l - + Op{l)\ ■ |/j*| 

v/Var(Zf|X) {I - p^) + o{l) 


(25) 


b) Thus, conditional on X, we get < 1 with probability tending to 1 if 


lim 


((fen + 1)^ - + • 1/^11 - (logp/n)^/^ g ■ \k^{l- p)\ ^ 

p{l + 2kS)il-^) + ikOf(l + P-2lp) 


(26) 


c) We also havePj^'^/P/^ < 1 with probability tending to 1 if 


lim 

n^OQ 


((fen^ + 1)^ - P^) • \/3*l\ - (logp/n)^/^ ^ ■ |/cgV| ^ 

V0T2kp){ikrp)TJkpf 


(27) 


□ 


8.5 Illustration of the Graph Structure in the Simulation Study 

Figure 4 shows the graph structure used in the simulation study with 5 hub-satellite 
clusters. In the simulation study, we use 50 such hub-satellite clusters. 

8.6 Additional Details for Analysis of TCGA Data 

8.6.1 Biological Evidence 

In this section, we summarize some of the biological evidences in support of the as¬ 
sociation between genes identihed by the Grace and GraceR tests with the onset, 
progression and severity of prostate cancer, as well as PSA level. 

As pointed out in the main paper, the Grace and GraceR tests identify a number of 
histone genes and histone deacetylase (HDAG) genes. Previous research indicates that 
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histone genes are associated with the occurrence, clinical outcomes and recurrence of 
prostate cancer (Seligson et ah, 2005; Ke et ah, 2009). The pathological role of HDAC 
genes on the onset and progression of prostate cancer have also been previously studied 
(Halkidou et ah, 2004; Chen et ah, 2007; Abbas and Gupta, 2008). 

In addition to the highly connected histone and HDAC genes, the GraceR test 
also identihes some disconnected genes. Prior works shows that the expression of 
ribonucleoside-diphosphate reductase subunit M2 (RRM2) is associated with higher 
Gleason scores, which correlate with the severity of prostate cancer (Huang et ah, 
2014). Protein arginine methyltransferase 1 (PRMTl) may also have an effect on the 
proliferation of prostate cancer cells (Yu et ah, 2009). Activation of olfactory receptors 
(OR) prevents proliferation of prostate cancer cells (Neuhaus et ah, 2009). Interferon-y 
(IFNG) plays a role in the differentiation of human prostate basal-epithelial cells (Un- 
tergasser et ah, 2005). IFNG is connected to the interleukin receptor 22 al (IL22RA1), 
the role of which related to prostate cancer is unknown. However, several earlier stud¬ 
ies point out the associations between prostate cancer and several other interleukin 
receptors in the Janus kinase and signal transducer and activator of transcription 
(JAK-STAT) activating family, including IL 6, 8, 11, 13 and 17 genes(Gulig et ah, 
2005; Inoue et ah, 2000; Gampbell et ah, 2001; Maini et ah, 1997; Zhang et ah, 2012). 
Gell-division cycle genes (GDG) may also be associated with various cancers. The 
association between collagen type 2 al (GOL2A1) and prostate cancer is also not 
known, but other collagen genes, including type 1 a2/31, type 4 q; 5 and a6, have been 
shown to be associated with prostate cancer progression (Hall et ah, 2008; Dehan et ah, 
1997). Although the association between phosphate cytidylyltransferase 1 choline-o; 
(PGYTIA) and prostate cancer or PSA level is not known, Vaezi et al. (2014) shows 
that PGYTIA is a prognostic factor in survival for patients with lung and head and 
neck squamous cell carcinomas. 

8.6.2 Stability of the Grace Test to the Tuning Parameter 

Figure 5 shows the number of significant genes identihed by the Grace test in the TGGA 
data against various values of he- The results indicate that the number of genes found 
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by the Grace test is relatively stable for a range of tuning parameters including the CV 
choice. On the other hand, very few genes are identihed when the tuning parameter 
is too small or too large. This is because, with small tuning parameters, the variance 
is large and thus no gene is statistically signihcant. On the other hand, with large 
tuning parameters, the stochastic bound T^ dominates Zj. Note that above results 
of power do not contradict Theorem 3.2, which shows the asymptotic power of the 
Grace test improves as we use larger ho- A vital condition for Theorem 3.2 to hold is 
11/3-/3||i = Opil). 

8.6.3 Stability of the Grace Test to the Network 

We examine whether the result of the Grace test on the TGGA data is sensitive to 
the KEGG network structure. To this end, we randomly change the connectivity of m 
node pairs in the KEGG network and form the new perturbed network G, \E/S.E\ = m, 
where A is the symmetric difference operator between two sets. In other words, for 
m randomly selected node pairs {ai,hi), i = l,...,m, if there is an edge {ai,bi) in the 
KEGG network, we remove it in the perturbed network; otherwise, we add an edge in 
the perturbed network. In our examination, m ranges from 10, 000 to 600, 000. Note 
that there are 38,541 edges in the original KEGG network. We counted the number 
of genes that are signihcant using both networks. The result shown in Figure 6 is an 
average of 50 independent replications. 

8.6.4 Prediction Performance 

We also compare the prediction performance by Grace, GraceR, Gracel and lasso with 
tuning parameters chosen by 10-fold GV, as well as ridge with /i 2 = 1. The result is 
shown in Table 1. GraceR produced the smallest GV prediction error, followed closely 
by Gracel and Grace. This result may indicate the KEGG network information is in 
fact informative in prediction. 



Table 1: Prediction performance of the Grace, GraceR, Gracel(ridge regression with tuning pa¬ 
rameter chosen by GV), ridge (^2 = 1) and lasso. The performance metric is the sum of 10-fold 
GV prediction error (GVER). 


Grace 

GraceR 

Gracel 

Ridge 

Lasso 

GVER 3473 

3411 

3418 

3917 

3546 


8.7 Additional Simulation Studies with Extended NPE 

We performed simulation studies with extended NPE G {-225, -165, -70, -10, 0, 15, 
135, 350, 600, 900, 1250, 1650, 2050, 3150}. These perturbations in the network 
correspond to the spectral norm of perturbations \\L — T*|| 2 /||T *||2 equal 0.85, 0.75, 
0.50, 0.25, 0, 0.25, 0.50, 0.75, 1.00, 1.25, 1.50, 1.75, 2.00 and 2.65, respectively. The 
power and type-I error rates are summarized in Figure 7, Table 2 and Table 3. Our 
conclusions on the simulation study stated in the main paper do not change with this 
expanded version of simulation study. 
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Table 2: Mean power and the standard error for the LDPE test, ridge test, Gracel, Grace and 
GraceR tests with different values. 


LDPE 

Ridge 

Gracel 

Grace NPE = -225 
Grace NPE = -165 
Grace NPE = -70 
Grace NPE = -10 
Grace NPE = 0 
Grace NPE =15 
Grace NPE =135 
Grace NPE = 350 
Grace NPE = 600 
Grace NPE = 900 
Grace NPE = 1250 
Grace NPE = 1650 
Grace NPE = 2050 
Grace NPE = 3150 
GraceR NPE = -225 
GraceR NPE = -165 
GraceR NPE = -70 
GraceR NPE = -10 
GraceR NPE = 0 
GraceR NPE = 15 
GraceR NPE = 135 
GraceR NPE = 350 
GraceR NPE = 600 
GraceR NPE = 900 
GraceR NPE = 1250 
GraceR NPE = 1650 
GraceR NPE = 2050 
GraceR NPE = 3150 


= 0.1 


0.181 

(0.011) 

0.274 

0.220 

(0.016) 

0.393 

0.493 

(0.026) 

0.769 

0.623 

(0.033) 

0.853 

0.720 

(0.032) 

0.918 

0.780 

(0.035) 

0.974 

0.839 

(0.035) 

0.986 

0.813 

(0.039) 

1.000 

0.760 

(0.042) 

0.947 

0.506 

(0.047) 

0.791 

0.431 

(0.045) 

0.732 

0.328 

(0.040) 

0.719 

0.337 

(0.037) 

0.609 

0.316 

(0.036) 

0.672 

0.376 

(0.040) 

0.688 

0.252 

(0.037) 

0.558 

0.312 

(0.037) 

0.622 

0.547 

(0.033) 

0.790 

0.606 

(0.032) 

0.831 

0.650 

(0.032) 

0.872 

0.722 

(0.034) 

0.904 

0.682 

(0.038) 

0.901 

0.702 

(0.035) 

0.887 

0.631 

(0.037) 

0.882 

0.628 

(0.036) 

0.878 

0.539 

(0.036) 

0.785 

0.490 

(0.033) 

0.781 

0.515 

(0.031) 

0.822 

0.585 

(0.032) 

0.821 

0.450 

(0.034) 

0.748 

0.442 

(0.036) 

0.767 


= 0.2 ^ Q 3 

(0.012) 0.343 (0.014) 

(0.018) 0.580 (0.019) 

(0.021) 0.868 (0.015) 

(0.018) 0.918 (0.011) 

(0.012) 0.959 (0.007) 

(0.005) 0.985 (0.004) 

(0.010) 0.998 (0.001) 

( 0 . 000 ) 1.000 ( 0 . 000 ) 

(0.022) 0.989 (0.010) 

(0.038) 0.920 (0.023) 

(0.041) 0.873 (0.031) 

(0.037) 0.906 (0.024) 

(0.041) 0.791 (0.032) 

(0.038) 0.911 (0.017) 

(0.037) 0.859 (0.025) 

(0.042) 0.792 (0.032) 

(0.038) 0.845 (0.024) 

(0.023) 0.882 (0.015) 

(0.018) 0.923 (0.012) 

(0.018) 0.925 (0.013) 

(0.019) 0.959 (0.011) 

(0.020) 0.928 (0.017) 

(0.023) 0.958 (0.011) 

(0.025) 0.957 (0.013) 

(0.018) 0.940 (0.013) 

(0.028) 0.905 (0.017) 

(0.024) 0.875 (0.016) 

(0.022) 0.909 (0.013) 

(0.022) 0.890 (0.016) 

(0.028) 0.876 (0.017) 

(0.025) 0.864 (0.017) 
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Table 3: Mean type-I error rate and the standard error for the LDPE test, ridge test, Gracel, 
Grace and GraceR tests with different values. 


LDPE 

Ridge 

Gracel 

Grace NPE = -225 
Grace NPE = -165 
Grace NPE = -70 
Grace NPE = -10 
Grace NPE = 0 
Grace NPE = 15 
Grace NPE = 135 
Grace NPE = 350 
Grace NPE = 600 
Grace NPE = 900 
Grace NPE = 1250 
Grace NPE = 1650 
Grace NPE = 2050 
Grace NPE = 3150 
GraceR NPE = -225 
GraceR NPE = -165 
GraceR NPE = -70 
GraceR NPE = -10 
GraceR NPE = 0 
GraceR NPE = 15 
GraceR NPE = 135 
GraceR NPE = 350 
GraceR NPE = 600 
GraceR NPE = 900 
GraceR NPE = 1250 
GraceR NPE = 1650 
GraceR NPE = 2050 
GraceR NPE = 3150 


= 0.1 


0.048 (0.0010) 

0.048 

0.046 (0.0012) 

0.048 

0.031 (0.0010) 

0.027 

0.026 (0.0013) 

0.021 

0.025 (0.0014) 

0.020 

0.027 (0.0021) 

0.019 

0.022 (0.0021) 

0.015 

0.024 (0.0021) 

0.017 

0.032 (0.0034) 

0.031 

0.040 (0.0073) 

0.037 

0.059 (0.0137) 

0.051 

0.060 (0.0156) 

0.059 

0.041 (0.0115) 

0.038 

0.052 (0.0151) 

0.045 

0.044 (0.0141) 

0.045 

0.039 (0.0141) 

0.035 

0.039 (0.0110) 

0.027 

0.027 (0.0012) 

0.023 

0.028 (0.0013) 

0.023 

0.028 (0.0014) 

0.022 

0.026 (0.0018) 

0.020 

0.027 (0.0018) 

0.022 

0.030 (0.0025) 

0.026 

0.058 (0.0165) 

0.041 

0.076 (0.0182) 

0.059 

0.058 (0.0145) 

0.054 

0.044 (0.0109) 

0.040 

0.057 (0.0125) 

0.044 

0.053 (0.0138) 

0.047 

0.045 (0.0111) 

0.033 

0.039 (0.0053) 

0.029 


= 0.2 ^ Q 3 

(0.0010) 0.047 (0.0010) 

(0.0013) 0.050 (0.0012) 
(0.0009) 0.025 (0.0008) 

(0.0012) 0.019 (0.0010) 

(0.0013) 0.017 (0.0012) 

(0.0017) 0.014 (0.0013) 

(0.0017) 0.013 (0.0015) 

(0.0017) 0.011 (0.0013) 

(0.0031) 0.028 (0.0028) 

(0.0059) 0.029 (0.0042) 

(0.0102) 0.036 (0.0052) 

(0.0155) 0.040 (0.0083) 

(0.0101) 0.027 (0.0033) 

(0.0111) 0.037 (0.0075) 

(0.0125) 0.038 (0.0104) 

(0.0112) 0.027 (0.0023) 

(0.0024) 0.026 (0.0015) 

(0.0011) 0.020 (0.0009) 

(0.0011) 0.019 (0.0010) 

(0.0014) 0.018 (0.0012) 

(0.0015) 0.017 (0.0014) 

(0.0016) 0.015 (0.0013) 

(0.0025) 0.021 (0.0025) 

(0.0112) 0.038 (0.0103) 

(0.0152) 0.030 (0.0027) 

(0.0139) 0.027 (0.0016) 

(0.0099) 0.025 (0.0010) 

(0.0100) 0.034 (0.0071) 

(0.0122) 0.039 (0.0104) 

(0.0038) 0.025 (0.0009) 

(0.0017) 0.025 (0.0012) 
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Figure 1: (a) The ratio of Z, p, |/3J'|) over 0, p, |/9J'|) for different I and p with 

h^/n = jn = 10, [logp/n]^/^ ^ = 0.25 and (31 = 1. A plus sign indicates the ratio is greater 
than 1.02, whereas a minus sign indicates the ratio is smaller than 0.98; hlled circles indicate an 
intermediate value, (b) The log-ratio of Tp^„(h^,/,p, |/9i|) over A — p2 for different I and p with 
h^/n = 10, [logp/nj^AA = o.25 and (3^ = 1. A plus sign indicates the log-ratio is greater than 
0.5 (ratio > 1.65), whereas a minus sign indicates the log-ratio is smaller than -0.5 (ratio < 0.61); 
filled circles indicate an intermediate value 
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Figure 2: Comparison of powers and type-I error rates of different testing methods, along with 
their 95% conhdence bands. Testing methods include LDPE (Zhang and Zhang, 2014; van de 
Geer et ah, 2014), ridge (Biihlmann, 2013), Gracel, Grace and GraceR tests. Filled circles (•) 
corresponds to powers, whereas crosses (x) are type-I error rates. Numbers on x-axis for Grace 
and GraceR tests refer to the number of perturbed edges (NPE) in the network used for testing, 
compared to the true network used to generate the data. 
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Figure 3: Results of analysis of TCGA prostate cancer data using the (a) Grace and (b) GraceR 
tests after adjusting for FDR at 0.05 level. In each case, genes found to be signihcantly associated 
with PSA level are shown, along with their interactions based on information from KEGG. 
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Figure 5: Number of genes identified by the Grace test in the TCGA data against the tuning 
parameter of the Grace test, ha- The red dashed line corresponds to the choice made by 10-fold 
GV {ho = exp(14.2)). 
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Figure 6: Number of genes that are significant using both the KEGG network and the perturbed 
network against the number of perturbed edges. The red dashed line represents the number of 
genes identified by the Grace test with the KEGG network. 
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Figure 7: Comparison of power and type-I error rates of different testing methods with their 95% 
conhdence bands. Testing methods include LDPE, ridge, Gracel, Grace and GraceR. Filled circles 
(•) show powers, whereas crosses (x) are type-I error rates. Numbers on x-axis for Grace and 
GraceR tests refer to the number of perturbed edges (NPE). 
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